1. INTRODUCTION

Search engines have become one of the most important functions or tools of information systems,
especially on-line systems [1]. Search engine technology makes it easy for system users to get information
quickly [2]. Google is one of the most capable search engines, but it still has limitations in analyzing the
content and meaning of search results [3]. Along with the growing circulation of data on the internet, search
engines are expected to return results quickly and accurately. The search function has therefore become
essential for getting information easily and quickly. However, not all search engines are designed to find
specific information precisely and accurately. In this study, a search engine was built specifically to retrieve
information about the hadith in accordance with user needs. The hadith is the second most important source
of law for Muslims after the Holy Qur'an [4, 5], so the hadith information returned must match what the user
is looking for. Therefore, the search engine needs to consider semantics, both in the entered keywords and in
the hadith data stored in the system.
A hadith collection in text form requires certain processing so that the meaning of the text is
maintained [6]. The process starts by turning unstructured text data into structured data [7, 8]. The structured
representation of the text can then be used in later processes, both in information retrieval (IR) and in text
mining [9]. This study builds the search engine with an IR technique that combines the latent semantic
analysis algorithm with cosine similarity. In contrast to text mining, where the results obtained from
the system are not known in advance, IR produces information whose form is already known, because it is
the same as the data held in the collection [10-12]. IR is used to relate the documents in a large text
collection to the given keywords. The parts of IR include:

- Text operations: selecting the words (term selection) in keywords or documents and transforming
the documents or keywords into term indexes.

- Query formulation: giving weights to the indexed terms of the keywords.

- Ranking: finding the documents that are relevant to the keywords and ordering them by how well they
match the keywords.

- Indexing: building a database of indexes from the document collection. Indexing is carried out before
any documents are searched.

An IR system accepts keywords from users, then ranks the documents in the collection by how well they
match the keywords. The ranked result given to users consists of the documents that the system judges to be
relevant to the keywords. However, the relevance of a document to a keyword is a subjective judgment,
influenced by many factors such as topic, timing, source of information, and the user's objective.
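
The indexing and ranking parts listed above can be illustrated with a minimal Python sketch. It is only an assumption-laden toy: the function names `build_index` and `rank`, the two sample documents, and the scoring rule (counting shared terms) are hypothetical and are not the method used in this study, which ranks with latent semantic analysis and cosine similarity.

```python
from collections import defaultdict

def build_index(docs):
    """Indexing: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def rank(query, index):
    """Ranking: order documents by how many distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id so the order is stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical two-document collection, already pre-processed.
docs = {
    1: "jangan kalian dusta atas nama",
    2: "sengaja dusta atas nama",
}
index = build_index(docs)
ranking = rank("jangan dusta", index)
print(ranking)
```

The index is built once, before any query arrives, which mirrors the remark that indexing is carried out before searching.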

The latent semantic analysis algorithm is widely used to process text data with a semantic approach so
that the meaning of the text is maintained. Latent semantic analysis can be used not only for text
summarization [13-15], plagiarism checking [15], and automatic essay evaluation [16], but also for
searching. Latent semantic analysis compares the entered text with the stored text collection based on
vector representations [17-19], using a semantic approach to preserve the meaning of the texts.
In addition to latent semantic analysis, this hadith search engine uses cosine similarity to measure
the similarity between the input and the text data returned by the search engine, so that the results can be
ordered with the most similar texts at the top. Cosine similarity is one of the most popular similarity
measures applied to text documents [20]. Its main advantage is that it is not affected by the length of
a document, because only the term values of each document matter. Based on this background, the research
questions are as follows. How can latent semantic analysis and cosine similarity be implemented in a hadith
search engine to find hadith text based on the entered keywords? Can latent semantic analysis and cosine
similarity find hadith text data that are correct and relevant to the entered keywords?


2. RESEARCH METHOD 

Figure 1 describes the activity flow of this research. In general, this research used an IR technique that
implements the latent semantic analysis (LSA) and cosine similarity algorithms to produce hadith information
based on input keywords. The activity begins with entering the keywords (words, phrases, or sentences).
The input keywords are processed in the text pre-processing phase to clean the text data. Then, the LSA
algorithm is run to create the term-document matrix and get the vector value of each document. Finally,
the similarity between the input keywords and the hadith data collection is calculated using cosine similarity.


[Figure 1 (flowchart). Nodes: Input keywords; Hadith data collection; Text pre-processing (1. Tokenizing,
2. Casefolding, 3. Filtering/cleaning data, 4. Removing stopwords, 5. Stemming); Conducting latent semantic
analysis (1. Creating term document matrix, 2. Calculating singular value decomposition, 3. Calculating
vector value from each document); Calculating cosine similarity value; Information of hadith; End.]

Figure 1. Research Activities


TELKOMNIKA Telecommun Comput El Control, Vol. 18, No. 1, February 2020: 217 - 227 







2.1. Latent semantic analysis (LSA) 

Latent semantic analysis is an algebraic method that extracts hidden semantic structures from words
and sentences [21]. It is one of the algorithms developed in the field of information retrieval that can handle
a large number of documents in a database and relate documents to one another by matching the given input.
The main function of latent semantic analysis is to calculate the similarity of a text by comparing its vector
representation with those of other texts [15]. The results of latent semantic analysis represent text data
contextually and semantically, which gives the text its meaning [21, 22]. Evaluation with the latent semantic
analysis method focuses on the words in a text without considering word order or grammar, so a sentence is
assessed based on the key words it contains [23]. Basically, latent semantic analysis extracts information
from patterns or collections of words that often appear together in different sentences. If sentences share
many of the same frequently occurring words, the sentences are considered semantically related [21].
In general, the steps of latent semantic analysis for text data are [24]: text pre-processing, creating
the term-document matrix, calculating the singular value decomposition (SVD), and calculating the vector
value of each document.


2.1.1. Text pre-processing 

The text pre-processing stage prepares unstructured text data to become a structured data
representation [7, 25, 26]. The process consists of tokenization, removal of regular-expression patterns
and non-letter characters, stop-word removal, and stemming. If needed, a special process is carried out to
handle natural-language phenomena in the text data, such as abbreviations, slang, and regional languages.
Text pre-processing is discussed further in section 3.1.
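
The pre-processing steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the stop-word set `STOPWORDS` is a hypothetical, deliberately tiny list, and `naive_stem` is a placeholder that strips a few affixes. It is not the Nazief & Adriani algorithm used in this study. Case folding and filtering are applied before tokenizing here simply because that is convenient with regular expressions.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real system needs a full Indonesian list.
STOPWORDS = {"karena", "siapa", "yang", "dia", "akan", "maka", "dari"}

def naive_stem(token):
    """Placeholder stemmer that strips a few affixes.
    This is NOT the Nazief & Adriani algorithm, only a stand-in for illustration."""
    for suffix in ("lah", "ku", "nya"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            token = token[: -len(suffix)]
            break
    if token.startswith("ber") and len(token) - 3 >= 4:
        token = token[3:]
    return token

def preprocess(text):
    text = text.lower()                                   # case folding
    text = re.sub(r"[^a-z\s]", " ", text)                 # filtering: drop non-letter characters
    tokens = text.split()                                 # tokenizing
    tokens = [t for t in tokens if t not in STOPWORDS]    # stop-word removal
    return [naive_stem(t) for t in tokens]                # stemming (placeholder)

tokens = preprocess(
    "Janganlah kalian berdusta atas namaku, karena siapa yang berdusta "
    "atas namaku niscaya dia masuk neraka."
)
print(tokens)
```

A production stemmer would consult a root-word dictionary before removing affixes; the length guard above is only a crude way to avoid over-stemming short words.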


2.1.2. Creating the term-document matrix

After the pre-processing stage, the term-document matrix is constructed by placing each word produced
by the stemming process (a term) in a row. Each row represents a unique word, while each column represents
the source in which the word was found. The source can be a sentence, a paragraph, or a whole text.
An example of a term-document matrix is shown in Table 1 (presented in Indonesian). In Table 1, the first
column lists the stemmed terms, that is, the words that have passed through pre-processing up to stemming
(term 1, term 2, and so on), and the remaining columns represent the context, namely the documents.
The value in each cell shows how many times a term appears in a document. For instance, if term 1 appears
once in the first document, twice in the second document, and not at all in the third document, its row
contains 1, 2, and 0.
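
A minimal Python sketch of building such a matrix is given below. The function name `term_document_matrix` and the three short token lists are hypothetical and chosen only to keep the example small; rows are terms, columns are documents, and each cell is a raw count.

```python
def term_document_matrix(docs):
    """Return (terms, matrix), where matrix[i][j] counts term i in document j."""
    terms = []
    for doc in docs:
        for token in doc:
            if token not in terms:      # keep terms in order of first appearance
                terms.append(token)
    matrix = [[doc.count(term) for doc in docs] for term in terms]
    return terms, matrix

# Hypothetical pre-processed documents (lists of stemmed tokens).
docs = [
    ["jangan", "dusta", "masuk", "neraka"],
    ["dusta", "dusta", "neraka"],
    ["sengaja", "dusta"],
]
terms, matrix = term_document_matrix(docs)
for term, row in zip(terms, matrix):
    print(term, row)
```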


Table 1. Matrix example for term of document
Word                  Doc 1   Doc 2   Doc 3
jangan (do not)
kalian (you)
dusta (lie)
atas (on behalf)
nama (name)
niscaya (surely)
masuk (enter)
neraka (the hell)
sungguh (actually)
sengaja (expressly)
tempat (place)
duduk (seat)
hendak (should)





2.1.3. Calculating singular value decomposition and vector value for each document

Singular value decomposition (SVD) is a linear algebra theorem that splits the term-document matrix
into three new matrices: an orthogonal matrix of left singular vectors (U), a diagonal matrix of singular
values (S), and the transpose of an orthogonal matrix of right singular vectors (V) [27-29].
The decomposition is formulated in (1) and illustrated in Figure 2.


$$A = U S V^{T} \qquad (1)$$

Figure 2. SVD Illustration of (1) [30]


In (1), U is a matrix of size m x k and V is a matrix of size n x k, as illustrated in Figure 2.
U and V have orthonormal columns, so that:


$$U^{T}U = V^{T}V = I \qquad (2)$$


and S is a diagonal matrix of size k x k. The entries on the main diagonal of S are the singular values of
the matrix A. The results of the SVD can be better understood if A is written in a different form.
If $u_1, u_2, \ldots, u_k$ are the column vectors of U, $\sigma_1, \sigma_2, \ldots, \sigma_k$ are the diagonal
entries of S, and $v_1, v_2, \ldots, v_k$ are the column vectors of V, then A can be written as shown in (3).

$$A = \sum_{i=1}^{k} \sigma_i u_i v_i^{T} \qquad (3)$$


where the values $\sigma_i$, for i = 1, 2, ..., k, in (3) are sorted from the largest to the smallest.
If the largest values $\sigma_i$ are kept and the small (near zero) values $\sigma_i$ are discarded, we get
a good approximation of A. So, by using SVD, a matrix can be written as a sum of the components
$u_i v_i^{T}$, for i = 1, 2, ..., k, each weighted by its singular value $\sigma_i$, as in (4) [30].

$$A = \begin{pmatrix} u_1 & u_2 & \cdots & u_k \end{pmatrix}
\begin{pmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_k \end{pmatrix}
\begin{pmatrix} v_1^{T} \\ v_2^{T} \\ \vdots \\ v_k^{T} \end{pmatrix} \qquad (4)$$


SVD can identify and order the dimensions along which the data vary the most. SVD takes
the term-document matrix, which consists of words and documents as in Table 1, and breaks it down into
linearly independent components. The result of the SVD process is a set of vectors whose similarity is then
calculated with a similarity measure.
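
The decomposition and the rank reduction it enables can be sketched with NumPy. This is an illustration only: the 5 x 3 matrix below is a hypothetical toy example, not the hadith data, and `numpy.linalg.svd` is used as a stand-in for the hand calculation shown later in this paper.

```python
import numpy as np

# Hypothetical term-document matrix: 5 terms (rows) x 3 documents (columns).
A = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values to get a rank-k approximation of A.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

full_error = np.linalg.norm(A - U @ np.diag(s) @ Vt)   # close to zero
rank_k_error = np.linalg.norm(A - A_k)                 # equals the discarded singular value
print(s, full_error, rank_k_error)
```

With k = 2 and three singular values, the Frobenius norm of the approximation error equals the single discarded singular value, which is why dropping small singular values loses little information.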


2.1.4. Calculating cosine similarity 

Cosine similarity is used to calculate the cosine of the angle between each document vector in
a collection and the input vector [31, 32]. The smaller the angle, the higher the level of similarity between
the documents. The formula of cosine similarity is shown in (5):

$$\cos \alpha = \frac{A \cdot B}{|A|\,|B|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}} \qquad (5)$$

where A is a document vector, B is the input vector, n is the number of vector components, A · B is the dot
product of A and B, |A| is the length of A, |B| is the length of B, |A||B| is the product of the two lengths,
and α is the angle formed between A and B.
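
Formula (5) translates directly into a few lines of Python. The sketch below is a minimal illustration; the function name `cosine_similarity` and the choice to return 0.0 for a zero-length vector are assumptions made here, not part of the original method.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b, following formula (5)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)

# Scaling a vector changes its length but not its direction, so the cosine stays 1.
print(cosine_similarity([1, 2], [10, 20]))
# Orthogonal vectors share no direction, so the cosine is 0.
print(cosine_similarity([1, 0], [0, 3]))
```

The first call shows why document length does not affect the measure: a longer document with the same term proportions points in the same direction.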


3. RESULTS AND ANALYSIS 

This section explains the results of the research, discusses how LSA and cosine similarity (CS) are
implemented in searching for hadith information, and presents the evaluation results of the experiment
that was conducted.


3.1. Pre-processing for text data 

Text data is unstructured data that needs special treatment before mining or searching for
the information contained in the text [30]. The pre-processing stage prepares text data to become
a structured data representation. In general, there are two types of structured data representation for text:
bag of words and multiple of words [33, 34]. Latent semantic analysis is one algorithm that produces
a structured text representation in the form of multiple of words, where the text is represented not only by
single words but also by sequences of more than one word, also known as n-grams. The word collections in
latent semantic analysis also take into account the semantics between one word and another.

Pre-processing of text data starts with converting all letters to lowercase, followed by deleting
characters other than letters and regular-expression patterns, expanding abbreviations to their original form
if necessary, removing unimportant words (stop-word removal), and finally reducing words to their root
form (stemming). In this study, the stemming process uses the Nazief & Adriani algorithm because
the hadith text documents are in Indonesian. The Nazief & Adriani algorithm is the most commonly used
stemming algorithm for Indonesian because it matches the rules of the Indonesian language [35-39].
The results of stemming are used as input for latent semantic analysis to form the term-document matrix
from the text data.


3.2. Implementation of latent semantic analysis and cosine similarity on the hadith search engine

Latent semantic analysis is applied after text pre-processing is complete. The pre-processing results
are formed into the term-document matrix, which is then decomposed by SVD to produce the matrices U, S,
and V. The final stage is the application of cosine similarity to measure the similarity of the retrieved
information and to order it by level of similarity. The flow of latent semantic analysis and cosine similarity
implemented in this study is shown in Figure 1. As an example, consider the following three hadith
documents (presented in Indonesian):





Document 1:

Janganlah kalian berdusta atas namaku, karena siapa yang berdusta atas namaku niscaya dia masuk
neraka.

(Do not lie in my name, because whoever lies in my name will surely enter hell.)

Document 2:

Janganlah kalian berdusta terhadapku (atas namaku), karena barangsiapa berdusta terhadapku dia akan
masuk neraka.

(Do not lie against me (in my name), because whoever lies against me will enter hell.)

Document 3:

Barangsiapa yang sengaja melakukan kedustaan atas namaku, maka hendaklah dia menempati tempat
duduknya dari neraka.

(Whoever deliberately lies in my name, let him occupy his seat in hell.)


Input Keywords in Hadith Search Engine:
Jangan Dusta Masuk Neraka
(Do not lie, enter hell)











The text data from these three documents and the input keywords go into the search engine.
Pre-processing is carried out and produces the following text data:





Document 1: jangan kalian dusta atas nama dusta 

Document 2: jangan kalian dusta atas nama dusta masuk neraka 
Document 3: sengaja dusta atas nama hendak tempat duduk neraka 
Input keywords in hadith search engine: jangan dusta masuk neraka 











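Then, the three prepared documents are processed to form the term-document matrix as in Table 1,
which gives the matrix A as follows:

$$A = \begin{pmatrix}
1 & 1 & 0 \\
1 & 1 & 0 \\
1 & 1 & 1 \\
1 & 1 & 1 \\
1 & 1 & 1 \\
1 & 0 & 0 \\
1 & 1 & 0 \\
1 & 1 & 1 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 0 & 1 \\
0 & 0 & 1 \\
0 & 0 & 1
\end{pmatrix}$$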
The main step is to decompose the matrix A into three other matrices using SVD, starting from finding
the value of $A^{T}A$ and ending with the cosine similarity calculation. The process of applying latent
semantic analysis and cosine similarity to the term-document matrix is as follows.


Find the value of $A^{T}A$:

$$A^{T}A = \begin{pmatrix} 8 & 7 & 4 \\ 6 & 7 & 4 \\ 4 & 4 & 8 \end{pmatrix}$$


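Find the determinant of the $A^{T}A$ result, so that $|A^{T}A - \lambda I| = 0$:

$$A^{T}A - \lambda I = \begin{pmatrix} 8 & 7 & 4 \\ 6 & 7 & 4 \\ 4 & 4 & 8 \end{pmatrix} - \begin{pmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{pmatrix} = \begin{pmatrix} 8-\lambda & 7 & 4 \\ 6 & 7-\lambda & 4 \\ 4 & 4 & 8-\lambda \end{pmatrix}$$

$$|A^{T}A - \lambda I| = (8-\lambda)\det\begin{pmatrix} 7-\lambda & 4 \\ 4 & 8-\lambda \end{pmatrix} - (7)\det\begin{pmatrix} 6 & 4 \\ 4 & 8-\lambda \end{pmatrix} + (4)\det\begin{pmatrix} 6 & 7-\lambda \\ 4 & 4 \end{pmatrix}$$

$$|A^{T}A - \lambda I| = (8-\lambda)\left[(7-\lambda)(8-\lambda) - (4)(4)\right] - (7)\left[(6)(8-\lambda) - (4)(4)\right] + (4)\left[(6)(4) - (4)(7-\lambda)\right]$$

$$|A^{T}A - \lambda I| = -\lambda^{3} + 23\lambda^{2} - 102\lambda + 80 = 0$$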


Find the eigenvalues and eigenvectors:


Eigenvalues:                 Eigenvectors:

$\lambda_1 = 17.40312$       $V_1 = (1.24704,\ 1.10373,\ 1)$
$\lambda_2 = 4.59687$        $V_2 = (-0.54366,\ -0.30712,\ 1)$
$\lambda_3 = 1$              $V_3 = (-1,\ 1,\ 0)$
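
As a quick numerical check, the eigenvalues are the roots of the characteristic polynomial $-\lambda^{3} + 23\lambda^{2} - 102\lambda + 80$, which factors as $-(\lambda - 1)(\lambda^{2} - 22\lambda + 80)$. The short Python sketch below solves the quadratic factor and evaluates the cubic at each root; the function name `char_poly` is introduced here only for illustration.

```python
import math

def char_poly(lam):
    """Characteristic polynomial from the worked example."""
    return -lam**3 + 23 * lam**2 - 102 * lam + 80

# lam = 1 is a root, so the cubic factors as -(lam - 1)(lam^2 - 22*lam + 80).
disc = 22**2 - 4 * 80
roots = sorted(
    [1.0, (22 - math.sqrt(disc)) / 2, (22 + math.sqrt(disc)) / 2],
    reverse=True,
)
residuals = [char_poly(r) for r in roots]
print(roots)       # largest root first
print(residuals)   # each value should be close to zero
```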


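Find the singular value matrix from the eigenvalues that have been obtained:


$$S_1 = \sqrt{17.40312} = 4.1717$$

$$S_2 = \sqrt{4.59687} = 2.14403$$

$$S_3 = \sqrt{1} = 1$$

$$S = \begin{pmatrix} S_1 & 0 & 0 \\ 0 & S_2 & 0 \\ 0 & 0 & S_3 \end{pmatrix} = \begin{pmatrix} 4.1717 & 0 & 0 \\ 0 & 2.14403 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

$$S^{-1} = \begin{pmatrix} 0.23971 & 0 & 0 \\ 0 & 0.46641 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$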


Find the V matrix by normalizing the eigenvectors that have been obtained:

$$|V_1| = \sqrt{1.24704^{2} + 1.10373^{2} + 1^{2}} = 1.94251$$

$$|V_2| = \sqrt{(-0.54366)^{2} + (-0.30712)^{2} + 1^{2}} = 1.17894$$

$$|V_3| = \sqrt{(-1)^{2} + 1^{2} + 0^{2}} = 1.41421$$

$$V_1 = \left(\frac{1.24704}{1.94251},\ \frac{1.10373}{1.94251},\ \frac{1}{1.94251}\right) = (0.64197,\ 0.56819,\ 0.51479)$$

$$V_2 = \left(\frac{-0.54366}{1.17894},\ \frac{-0.30712}{1.17894},\ \frac{1}{1.17894}\right) = (-0.46114,\ -0.26051,\ 0.84822)$$

$$V_3 = \left(\frac{-1}{1.41421},\ \frac{1}{1.41421},\ \frac{0}{1.41421}\right) = (-0.70711,\ 0.70711,\ 0)$$
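
The normalization step can be reproduced with a few lines of Python. The sketch divides each eigenvector by its Euclidean length; the helper name `normalize` is an assumption introduced for this illustration, and the input values are the eigenvectors listed in the worked example.

```python
import math

def normalize(v):
    """Divide a vector by its Euclidean length so that the result has length 1."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

# Eigenvectors from the worked example.
eigenvectors = [
    [1.24704, 1.10373, 1.0],
    [-0.54366, -0.30712, 1.0],
    [-1.0, 1.0, 0.0],
]
unit_vectors = [normalize(v) for v in eigenvectors]
for v in unit_vectors:
    print([round(x, 5) for x in v])
```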


Form the V matrix from the values obtained by normalizing the eigenvectors:

$$V = \begin{pmatrix} 0.64197 & 0.56819 & 0.51479 \\ -0.46114 & -0.26051 & 0.84822 \\ -0.70711 & 0.70711 & 0 \end{pmatrix}$$

$$V^{T} = \begin{pmatrix} 0.64197 & -0.46114 & -0.70711 \\ 0.56819 & -0.26051 & 0.70711 \\ 0.51479 & 0.84822 & 0 \end{pmatrix}$$


Find the U matrix with the formula $U = A V S^{-1}$:

$$U = A \begin{pmatrix} 0.64197 & 0.56819 & 0.51479 \\ -0.46114 & -0.26051 & 0.84822 \\ -0.70711 & 0.70711 & 0 \end{pmatrix} \begin{pmatrix} 0.23971 & 0 & 0 \\ 0 & 0.46641 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

$$U = \begin{pmatrix}
0.04335 & 0.14351 & 1.36301 \\
0.04335 & 0.14351 & 1.36301 \\
-0.12615 & 0.47331 & 1.36301 \\
-0.12615 & 0.47331 & 1.36301 \\
-0.12615 & 0.47331 & 1.36301 \\
0.15389 & 0.26501 & 0.51479 \\
0.04335 & 0.14351 & 1.36301 \\
-0.12615 & 0.47331 & 1.36301 \\
-0.11054 & -0.12150 & 0.84822 \\
-0.16590 & 0.32980 & 0 \\
-0.16590 & 0.32980 & 0 \\
-0.16590 & 0.32980 & 0 \\
-0.16590 & 0.32980 & 0
\end{pmatrix}$$


After the matrices U, S, and $V^{T}$ are obtained, the next step is to reduce the rank of the matrices.
This is done to reduce computing time. An example of a rank reduction to k = 2 is as follows:


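$$U_k = \begin{pmatrix}
0.04335 & 0.14351 \\
0.04335 & 0.14351 \\
-0.12615 & 0.47331 \\
-0.12615 & 0.47331 \\
-0.12615 & 0.47331 \\
0.15389 & 0.26501 \\
0.04335 & 0.14351 \\
-0.12615 & 0.47331 \\
-0.11054 & -0.12150 \\
-0.16590 & 0.32980 \\
-0.16590 & 0.32980 \\
-0.16590 & 0.32980 \\
-0.16590 & 0.32980
\end{pmatrix}$$

$$S_k = \begin{pmatrix} 4.1717 & 0 \\ 0 & 2.14403 \end{pmatrix}; \quad S_k^{-1} = \begin{pmatrix} 0.23971 & 0 \\ 0 & 0.46641 \end{pmatrix}$$

$$V_k = \begin{pmatrix} 0.64197 & 0.56819 \\ -0.46114 & -0.26051 \\ -0.70711 & 0.70711 \end{pmatrix}; \quad V_k^{T} = \begin{pmatrix} 0.64197 & -0.46114 & -0.70711 \\ 0.56819 & -0.26051 & 0.70711 \end{pmatrix}$$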


The last step is to calculate the cosine of the angle between each document vector (A) and the input
vector (B). The input keywords are first mapped into the reduced space as:


$$D_i = D_i^{T} U_k S_k^{-1}$$





Latent semantic analysis and cosine similarity for hadith search engine (Wahyudin Darmalaksana) 


ISSN: 1693-6930


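$$D_i = D_i^{T} \begin{pmatrix}
0.04335 & 0.14351 \\
0.04335 & 0.14351 \\
-0.12615 & 0.47331 \\
-0.12615 & 0.47331 \\
-0.12615 & 0.47331 \\
0.15389 & 0.26501 \\
0.04335 & 0.14351 \\
-0.12615 & 0.47331 \\
-0.11054 & -0.12150 \\
-0.16590 & 0.32980 \\
-0.16590 & 0.32980 \\
-0.16590 & 0.32980 \\
-0.16590 & 0.32980
\end{pmatrix} \begin{pmatrix} 0.23971 & 0 \\ 0 & 0.46641 \end{pmatrix}$$

$$D_i = (-0.03970 \quad 0.57538)$$

$$D_1 = (0.64197 \quad 0.56819)$$

$$D_2 = (-0.46114 \quad -0.26051)$$

$$D_3 = (-0.70711 \quad 0.70711)$$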
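
The mapping of the input keywords into the reduced space can be sketched in Python as below. Two things are assumptions made for this illustration: the function name `fold_in`, and the binary term vector `query`, which marks the rows for jangan, dusta, masuk, and neraka in the term order of Table 1. The values of `U_k` and the diagonal of $S_k^{-1}$ are copied from the worked example.

```python
# U_k from the worked example (13 terms x 2 dimensions).
U_k = [
    [0.04335, 0.14351], [0.04335, 0.14351], [-0.12615, 0.47331],
    [-0.12615, 0.47331], [-0.12615, 0.47331], [0.15389, 0.26501],
    [0.04335, 0.14351], [-0.12615, 0.47331], [-0.11054, -0.12150],
    [-0.16590, 0.32980], [-0.16590, 0.32980], [-0.16590, 0.32980],
    [-0.16590, 0.32980],
]
S_k_inv = [0.23971, 0.46641]  # diagonal entries of the inverse singular value matrix

# Assumed term vector of the input keywords: 1 for jangan, dusta, masuk, neraka.
query = [1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]

def fold_in(q, u_k, s_inv):
    """Compute q^T U_k S_k^{-1} for a diagonal S_k^{-1} given as a list."""
    k = len(s_inv)
    projected = [sum(q[i] * u_k[i][j] for i in range(len(q))) for j in range(k)]
    return [projected[j] * s_inv[j] for j in range(k)]

q_k = fold_in(query, U_k, S_k_inv)
print([round(x, 5) for x in q_k])
```

Because $S_k^{-1}$ is diagonal, multiplying by it reduces to scaling each projected coordinate, which is why it is stored as a plain list here.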








$$\cos \alpha = \frac{A \cdot B}{|A|\,|B|}$$

$$\cos \alpha_1 = \frac{(-0.03970)(0.64197) + (0.57538)(0.56819)}{\sqrt{(-0.03970)^{2} + (0.57538)^{2}}\,\sqrt{(0.64197)^{2} + (0.56819)^{2}}}$$

$$\cos \alpha_2 = \frac{(-0.03970)(-0.46114) + (0.57538)(-0.26051)}{\sqrt{(-0.03970)^{2} + (0.57538)^{2}}\,\sqrt{(-0.46114)^{2} + (-0.26051)^{2}}}$$

Cos α1 = 0.71113
Cos α2 = 0.43739
Cos α3 = 0.70542


From the results of the above calculation, it can be concluded that the documents, ordered from
the closest similarity to the input keywords, are document 1, document 3, and document 2.
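
Ordering the documents by their cosine values is a simple sort. The sketch below uses the three cosine values reported above; the dictionary layout and variable names are hypothetical.

```python
# Cosine values reported in the worked example, keyed by document.
scores = {"Document 1": 0.71113, "Document 2": 0.43739, "Document 3": 0.70542}

# Sort document names by score, highest similarity first.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```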


3.3. Experiment and result evaluation 
Testing is carried out by trying all the hadith queries on the system. Recall and precision values are
calculated using formulas (6) and (7) [38, 39].

$$R = \frac{\text{Number of relevant items retrieved}}{\text{Total number of relevant items in collection}} \qquad (6)$$

$$P = \frac{\text{Number of relevant items retrieved}}{\text{Total number of items retrieved}} \qquad (7)$$


where R is recall, obtained by comparing the number of relevant items retrieved with the total number of
relevant items in the collection. Recall concerns the documents that the system returns in response to
a user request. A high recall value alone does not indicate whether a system is good or not. P is precision,
obtained by comparing the number of relevant items retrieved with the total number of items retrieved.
Precision concerns the retrieved documents that the user judges to be relevant to the needed information.
The greater the precision of a system, the better the system can be said to be.
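
Formulas (6) and (7) can be written as a small Python function. This is a minimal sketch: the function name `recall_precision`, the handling of zero denominators, and the counts in the usage example are hypothetical and are not taken from Table 2.

```python
def recall_precision(relevant_retrieved, total_relevant, total_retrieved):
    """Return (recall, precision) as fractions, following formulas (6) and (7)."""
    # Assumption: a zero denominator yields 0.0 instead of raising an error.
    recall = relevant_retrieved / total_relevant if total_relevant else 0.0
    precision = relevant_retrieved / total_retrieved if total_retrieved else 0.0
    return recall, precision

# Hypothetical query: 20 hadith returned, 8 of them relevant, 10 relevant in the collection.
r, p = recall_precision(relevant_retrieved=8, total_relevant=10, total_retrieved=20)
print(f"recall = {r:.0%}, precision = {p:.0%}")
```

In this hypothetical case most of the relevant hadith are found (high recall), but more than half of what is returned is irrelevant (low precision), which is the same pattern the evaluation reports.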

The purpose of the recall and precision test is to assess the search results obtained by the system.
Search results can be judged by their recall and precision levels. Precision can be considered a measure of
accuracy, while recall is a measure of completeness. The precision value is the level of agreement between
the information requested by the user and the answers given by the system, while the recall value is
the success level of the system in retrieving the relevant information. The results of the recall and precision
tests, together with the time spent searching for the tested hadith, can be seen in Table 2 and Figures 3 and 4.
[Figure 3 (chart). Legend: Appeared relevant data; Appeared irrelevant data; The total number of relevant data.]

Figure 3. Result of relevant information


[Figure 4 (chart). Title: Precision and Recall Evaluation Result. Horizontal axis ticks: 1 to 49.
Series: Recall (%); Precision (%).]

Figure 4. Result of precision and recall value


Table 2. Tested result of latent semantics analysis and cosine similarity
Columns: Keywords; Appeared relevant hadith; Appeared irrelevant hadith; The total number of relevant
hadith; Recall (%); Precision (%)

Keywords:
- Jangan berdusta atas namaku masuk neraka (Do not lie in my name, enter hell)
- Mendirikan shalat menunaikan zakat dan berpuasa dibulan ramadlan (Perform prayer, pay alms, and fast in
  the month of Ramadan)
- Islam dibangun atas lima dasar yaitu persaksian, shalat, zakat, puasa dan ke baitullah (Islam is built on
  five pillars, namely witness, prayer, alms, fasting, and pilgrimage to Mecca)
- Barangsiapa yang berpuasa dibulan ramadlan dengan keimanan dan ikhlas diampuni dosa-dosanya (Whoever
  fasts in the month of Ramadan with faith and sincerity is forgiven of his sins)
- Malu sebagian dari iman (Shame is part of faith)
- Aku pernah mandi bersama Nabi shallallahu 'alaihi wasallam dari satu bejana, dan tangan kami saling
  bersentuhan (I once bathed with the Prophet sallallaahu 'alaihi wasallam from one vessel, and our hands
  touched each other)
- Setiap Nabi memiliki doa yang dia panjatkan untuk umatnya (Every Prophet has a prayer that he offers for
  his people)
- Jika datang haid tinggalkan shalat dan bila berakhir bersikan darah lalu shalatlah (If menstruation comes,
  leave prayer, and when it ends, clean the blood and then pray)
- Tujuh puluh ribu orang dari umatku akan masuk surga, wajah mereka semua seperti rembulan (Seventy thousand
  of my people will enter heaven, all their faces like the moon)
- Jadikanlah (sebagian dari) shalat kalian ada di rumah kalian dan jangan jadikan kuburan (Make (some of)
  your prayers in your houses and do not make them graves)
- Barangsiapa meninggal dalam keadaan menyekutukan Allah dengan sesuatu, maka ia masuk neraka (Whoever dies
  in a state of associating something with Allah enters hell)
- Cukuplah seseorang (dianggap) berbohong apabila dia menceritakan semua (It is enough for someone to be
  (considered) a liar if he tells everything)
- Seorang muslim yang paling baik adalah kambing yang digembalakannya di puncak gunung dan tempat-tempat
  terpencil (The best of a Muslim is the goats he herds on mountain tops and in remote places)
4. CONCLUSION

Based on 50 tests of the recall and precision values (contained in Table 2), the hadith search engine
applies the latent semantic analysis algorithm and cosine similarity quite well. Hadith information was
successfully retrieved based on the entered keywords, phrases, or sentences, as indicated by a recall value
of 87.83%. However, the accuracy of the returned information with respect to user input, as indicated by
the precision value, was only 36.25%. In general, the latent semantic analysis algorithm and cosine
similarity are able to produce hadith information well. Several factors influenced the search results other
than possible errors in applying the algorithm, including incomplete data and too much noise. Therefore,
the pre-processing stage is very important for producing more accurate information, because it produces
the text data that is the input to the latent semantic analysis algorithm and so directly affects the search
results. For further research, the stored hadith data collection needs to be completed so that the search
engine can return more precise information. In addition, the retrieved information could be not only sorted
by similarity but also grouped according to meaning.